
Skip flaky test TestSQSReceiver/ReceiveMessage_success #42245

Open: wants to merge 1 commit into base: main

@belimawr (Contributor) commented Jan 7, 2025

Proposed commit message

See title

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.


@belimawr added the Team:Elastic-Agent-Data-Plane label (Label for the Agent Data Plane team) on Jan 7, 2025
@belimawr self-assigned this on Jan 7, 2025
@belimawr requested a review from a team as a code owner on January 7, 2025 at 17:52
@elasticmachine (Collaborator) commented:

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic (bot) added and then removed the needs_team label (Indicates that the issue/PR needs a Team:* label) on Jan 7, 2025
mergify bot (Contributor) commented Jan 7, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR, @belimawr? 🙏
To do so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify bot (Contributor) commented Jan 7, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify (bot) added the backport-8.x label (Automated backport to the 8.x branch with mergify) on Jan 7, 2025
@jlind23 (Collaborator) commented Jan 8, 2025

@bturquet can we get someone from your team to review this?

@Kavindu-Dodan (Contributor) commented Jan 8, 2025

> @bturquet can we get someone from your team to review this?

I looked into the failing test and the logic it tests. IMO, this test is important because it validates SQS processing end to end with mocked processors, so we should not disable it.

The flakiness comes from this gomock stub [1], and the root cause seems to be:

  • We cancel the context to exit processing and end the test once SQS message processing completes and a call to DeleteMessage has been registered [2].
  • This context cancellation races with readSQSMessages [3], which internally calls ReceiveMessage, so ReceiveMessage can be called again before the cancellation is observed.

I think we can fix the flakiness by simply changing Times(1) to AnyTimes() [4]. This allows zero or more calls to read messages.

Footnotes

  1. https://github.com/elastic/beats/blob/v8.17.0/x-pack/filebeat/input/awss3/sqs_test.go#L59-L63

  2. https://github.com/elastic/beats/blob/v8.17.0/x-pack/filebeat/input/awss3/sqs_test.go#L75

  3. https://github.com/elastic/beats/blob/v8.17.0/x-pack/filebeat/input/awss3/sqs_input.go#L160C11-L160C26

  4. https://github.com/elastic/beats/blob/v8.17.0/x-pack/filebeat/input/awss3/sqs_test.go#L61

@Kavindu-Dodan (Contributor) left a review comment:

Shall we re-enable the test and go with the change proposed in this comment: #42245 (comment)?

@belimawr (Contributor, Author) commented:

> Shall we re-enable the test and go with the change proposed in this comment: #42245 (comment)?

From the data-plane perspective, as long as the test is not flaky and is not blocking CI, I don't see any issue with re-enabling it.

However, the solution you suggested seems to fix the symptom, not the root cause. I believe a better fix would address the race condition itself. I don't know the internals of this input, so I might be wrong 🙈

Anyway, feel free to open a PR with the fix you think is best, removing the t.Skip.

Labels: backport-8.x (Automated backport to the 8.x branch with mergify), Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)
Projects: None yet
5 participants